Rand’s Whiteboard Friday post on flat architecture was both interesting and relevant, as I recently launched a new web directory based on a flat architecture.
I’ve always admired the simplicity of Craigslist and wondered why web directories predominantly favor categories over localities. During the process I discovered one fairly obvious reason why, but more on that later. My intention in developing the site was to provide a directory that would be easily navigated by the engines (and obviously retain link juice) and that potential customers would perceive as a logical and relevant place to list a business.
Consider these two URLs for Boston law firms, the first from the Yahoo! Directory and the second from Craigslist:
- dir.yahoo.com/Regional/U_S__States/Massachusetts/Metropolitan_Areas/Boston_Metro/Business…
- boston.craigslist.com/lgs/
As Rand illustrated on the whiteboard, the shorter URLs are indexed more quickly and retain better link juice—not to mention being easier to read for the occasional human passerby. After checking around the web and seeing an opportunity to do better, I set out to create a web directory structured like Craigslist.
Before going more deeply into the architecture let me touch on the business basics, i.e., could a viable business result from this exercise?
I ruled out an ad-based model for three reasons. 1) The internet has commoditized ads, requiring heavy traffic to make decent revenue numbers. A web directory is just not going to see the traffic that a popular blog or newspaper will. 2) Secret revenue sharing formulas and a nasty recession have taught us that ad revenues can be unpredictable. 3) I wanted to keep the site design minimalist like Craigslist.
That left the paid submission model, which provides for a predictable cash flow needed for stability and growth.
So can money be made? Clearly it can. We have Yahoo! and Business.com charging $299 per year and Best of the Web ranging from $149 to $419 as well as some tier 2 brands like Joe Ant that charge under $50.
Other business requirements to proceed:
- Benefits the customer. Yes.
- Projected revenue can pay the bills. Yes.
- Have a unique sales proposition relative to the big guys. Yes.
- Have a marketing plan beyond “they will come.” Yes.
- Big competitors not likely to copy me and make life difficult. Yes.
With the feasibility check completed, I connected with a web developer and got started by presenting her with my penciled diagram on a sheet of paper. That marked the start of what is now SiteBeSeen. Sally’s job was to build the directory, build the back-end admin functions needed to review and access each listing, and connect the site with the payment processor. All of this is done in PHP.
While she was programming I dusted off my copy of FrontPage 2003, created a dummy home page, uploaded to GoDaddy and linked to the home page from my blog. Might as well give the engines a heads-up and start the age clock ticking.
Two months went by and my blank index page made the toolbar radar with a PageRank of 1. I was surprised by that and did a backlink check, only to find that several websites were linked to the domain. I wondered how they could have found a site that was just two months old, had no title keywords other than the company name, and had no content at all.
The answer is probably obvious to experienced hands: my domain had a past life. Someone operated it before me and picked up several backlinks. These links were 404s until I reactivated the site. After two years on a 404 list, I would have thought the engines would purge these dead links to reclaim a few bytes of hard drive space. Nope. Human zombies might be fiction, but zombie links are out there, my friends. (It’s okay if they get you though.) Too bad these weren’t NY Times zombie links.
The next phase was to get the site online and test it. The developer flipped the switch and it was live. She’s a talented programmer and built the site around a database using dynamic URLs. Sure, the engines can index dynamic URLs but my experience is that they do so at a glacial pace or they simply don’t bother. Either way, dynamics were a show-stopper for me. She wasn’t familiar with mod_rewrite so I sent her information on it and—after fighting and failing with GoDaddy’s implementation of it, then moving to a different host—she got it working.
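To illustrate what that rewriting step accomplishes, here is a minimal sketch in Python rather than an actual mod_rewrite rule. The path layout, the script name `listings.php`, and the parameter names `city` and `cat` are all assumptions made up for the example; the post does not say what SiteBeSeen's real URLs or scripts look like.

```python
import re

# Hypothetical clean-URL pattern: /<city>/<category>/
CLEAN_URL = re.compile(r"^/(?P<city>[a-z-]+)/(?P<category>[a-z-]+)/?$")


def rewrite(path):
    """Map a static-looking path to the dynamic URL a script might expect.

    Returns None when the path is not a city/category page, which
    corresponds to a rewrite rule simply not matching.
    """
    match = CLEAN_URL.match(path)
    if match is None:
        return None
    # listings.php, city and cat are illustrative names, not the real ones.
    return "/listings.php?city={city}&cat={category}".format(**match.groupdict())


print(rewrite("/boston/law-firms/"))
```

The visitor and the crawler only ever see the short path; the database-driven script still receives its parameters behind the scenes.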
Then one day a few weeks later: poof, no green in my toolbar and all the Google listings were gone. The duplicate content filter had found the site and sent it to Area 51; it didn’t exist.
Remember those URLs above? The advantage of a category-based directory is that you end up with unique pages, never any duplicates. If there are no ski shops in Miami then there is no ski shop category under Miami.
Locality-based sites like Craigslist and SiteBeSeen start with all the categories on one single page that is virtually identical for each city. It’s massive duplicate content and it’s not really feasible to make that page unique for each city. Every phone book, regardless of location, is going to have pretty much the same yellow pages categories; they all have auto repair, furniture stores, restaurants, etc.
Being structured like a phone book means we have the same issue. The only difference between the category pages of any two cities is the title tag and the heading. It is programmatically simple enough to display only the “inhabited” categories and thus make each city’s category page unique, but doing so would have ugly visual and marketing effects, so it’s not feasible.
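For the curious, the “programmatically simple” filter might look something like the sketch below. The category names, city names, and listings are invented sample data, and this is not the code the site uses (as noted, the site deliberately shows every category).

```python
# Full category list shown on every city page (sample data).
ALL_CATEGORIES = ["auto repair", "furniture stores", "restaurants", "ski shops"]

# Hypothetical listings: (city, category, business name).
LISTINGS = [
    ("boston", "restaurants", "Example Bistro"),
    ("boston", "ski shops", "Example Ski Co."),
    ("miami", "restaurants", "Example Cantina"),
]


def inhabited_categories(city, listings=LISTINGS):
    """Return only the categories that have at least one listing in `city`."""
    used = {category for listing_city, category, _ in listings
            if listing_city == city}
    # Keep the original display order of the full category list.
    return [category for category in ALL_CATEGORIES if category in used]


print(inhabited_categories("miami"))
```

With this sample data, Miami's page would shrink to a single category, which shows both why the pages become unique and why a mostly empty page would look bad to a prospective customer.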
I admit that I didn’t see the ban coming. My architectural role model is well indexed with great PR on all its duplicate pages. The difference is that CL has been around a while and SiteBeSeen, at that point, was a beta site with no history.
It’s a setback for sure to be an invisible web directory, so I took a deep breath, went to Webmaster Tools and requested reconsideration. I explained the architecture to them and the block was removed. To this day I’m amazed that someone in Mountain View can enter a few keystrokes and instantly restore a site in the search results. That’s pretty cool.
Circling back to flat architecture, the way we are structured means that a listing is never more than 3 levels (clicks) from the home page and just 2 if you start from one of the listed metro areas. Category-based directories require many more clicks and often employ pagination within the category, which makes matters worse. The test becomes this: is more link juice lost by distributing it to many links on one page or is more lost through successive layers of clicking?
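One rough way to frame that question is a toy model borrowed from the original PageRank formula, in which each page passes on a damped fraction of its score, split equally among its links. This is only a back-of-the-envelope sketch: the 0.85 damping factor, the even split of listings across levels, and the level counts are all assumptions, and it says nothing about how any engine actually scores pages today.

```python
DAMPING = 0.85  # assumed damping factor, as in the original PageRank formula


def share_per_listing(total_listings, levels):
    """Fraction of the home page's score that reaches one listing.

    Assumes the listings are spread evenly over `levels` layers of pages.
    The per-level link counts multiply out to `total_listings`, so the
    split among links is the same for any depth; only the damping applied
    at each extra click differs.
    """
    return DAMPING ** levels / total_listings


flat = share_per_listing(300, levels=3)  # listing three clicks from home
deep = share_per_listing(300, levels=6)  # same listings, six clicks deep
print(flat, deep, deep / flat)
```

Under these assumptions, spreading juice across many links on one page costs nothing extra, because the same total is divided among the same number of listings either way. Every additional layer of clicking, however, multiplies what arrives by the damping factor again.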
We already know the answer but let’s turn to the real world for an example. Compare the following two Yahoo! Directory categories, a US version and a Yahoo! Canada counterpart of the same URL.
US Version
dir.yahoo.com/Business_and_Economy/Business_to_Business/Financial_Services/Accounting/Firms/
This category page has 20 listings and then paginates. It is indexed by Google and has a PageRank of 5. Assuming that PageRank imparts some link juice, it is noteworthy that page 2 has no PageRank and is cached but not indexed by Google. Google is aware of it but has chosen not to index it, and therefore it will not appear in the search results. And so it goes for all subsequent pages under this category. This dilution isn’t unique to the Yahoo! Directory; it’s the nature of a deep architecture.
By contrast, the Canadian version uses a different approach. All the listings—all 300 of them—appear on a single page.
Here’s the Canadian URL
ca.dir.yahoo.com/Business_and_Economy/Business_to_Business/Financial_Services/Accounting/Firms/
The difference is compelling. The page is indexed (again, Google) and the links count toward the rank of the listed sites. Basic SEO checks on the sites listed can confirm that. Team Canada takes the prize on this one. Note also that this link value difference has not been lost on American SEOs since a significant majority of the listings are American companies.
Wrapping it up, the advantages of flat architecture are faster indexing and better link juice retention, both important factors in a web directory. The site also employs other SEO tactics such as relevant, auto-generated titles and basic page keywords for each category, as well as Google and Bing XML sitemaps for finding the subdomains. I expect it will take a while to be thoroughly indexed by the engines, but it is coming along with each Google update.
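As an illustration of the sitemap piece, the sketch below builds a minimal XML sitemap listing one home page per city subdomain. The `example.com` domain and the city names are placeholders, and the function is a hypothetical helper rather than the site's actual generator; it only produces the XML and does not cover how the file is submitted to each engine.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder subdomains; a real site would read these from its database.
cities = ["boston", "miami"]
sitemap_xml = build_sitemap("https://{}.example.com/".format(c) for c in cities)
print(sitemap_xml)
```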
Feel free to ask questions in the comments section and if you found this post interesting I would appreciate a tweet to the post.
Dave Goodwin is the founder of SiteBeSeen.com, a web directory with a really flat architecture.